Banner

The Pacific Northwest¶

Welcome to this journey, where we embark on a fascinating journey to explore the intersection of nature and advanced image and signal processing techniques, all set against the backdrop of the beautiful Pacific Northwest. Our exploration begins with a deep dive into the world of bird calls, specifically focusing on the Flicker Bird. We'll use autocorrelation and cross-correlation methods to analyze the periodicity of its calls, offering insights into the unique patterns of nature's own music.

As we progress, our will shift to the majestic brown bear. Here, we implement Thresholding-based Segmentation, utilizing Otsu's Method, to intricately segment an image of a brown-bear family. This approach not only demonstrates the power of basic thresholding techniques in wildlife scenarios but also serves as a gateway to understanding complex biological structures.

Further, we apply Color-based Thresholding in a real-world scenario to segment trees in a suburban neighborhood. This example highlights the practical applications of our the experiment in urban planning and environmental monitoring, particularly in assessing the health of urban greenery.

Lastly, we explore the Histogram of Oriented Gradients (HOG) Keypoint Descriptor, a tool in image processing. By applying this technique, we attempt to match various viewpoints of Seattle's iconic Space Needle, showcasing the robustness and versatility of HOG in object recognition and image matching.

Table of contents

  • The Pacific Northwest
    - Getting Our Tools Ready
    • Correlation in Signals
      • Flicker Bird Call
      • Autocorrelation
        • What Autocorrelation does Mathematically
        • Expectation of Result
        • Measuring periodicity numerically
      • Finding A Sample via Cross-Correlation
        • Defining Cross-Correlation
        • What are the differences to Autocorrelation?
        • Expectation of Result
      • Robustness of Cross-Correlation
        • Cross-Correlation in a noisy environment
        • Cross-Correlation with a more narrow window
        • Expected Result
        • Conclusion & Learnings
    • Segmentation, morphological operations, object attributes in images
      • Threshold-based Segmentation: Segmenting a Family of Brown Bears
        • Step 1: Enhancing the Contrast
        • Step 2: Binarizing the Image
        • Step 3: Getting rid of the fragments
        • Step 4: Cut out each bear separately
        • Conclusion & Learnings
      • Skeletonizing A Bear's Segment
        • What Skeletonization does
      • Color-based Thresholding: Measuring Your Neighborhoods Health of Greenery
        • Step 1: Defining Bounding Colors
        • Step 2: Preprocessing in HSV
        • Step 3: Applying Morphological Operations
        • Step 5: Label Trees individually
        • Measuring the Trees' Properties
      • Skeletonizing a Tree's Segmentation
    • Keypoint Matching
      • Preprocessing: Resizing and Cropping Images
      • What's contained in Histogram of Oriented Gradients?
      • Calculating HOGs
      • Matching the Preprocessed Space Needles
      • Looking at special cases
        • Matching HOG in Rotation
        • Lighting Conditions
        • Obstructing Objects
        • How Image Size influences Performance
        • Conclusion on HOG

Getting Our Tools Ready¶

Loading Tools

We start by loading up everything we need for our roadtrip through the pacific northwest.

The following tools will help us navigate and explore through the aspects of this beautiful region:

In [ ]:
import cv2
import matplotlib.pyplot as plt
import numpy as np
import librosa
import time

from skimage.morphology import skeletonize, closing, square, remove_small_objects, disk
from skimage.measure import label, regionprops, perimeter
from skimage.filters import threshold_otsu
from skimage.segmentation import clear_border

from scipy.ndimage import distance_transform_edt
from scipy.spatial.distance import cdist
from scipy.signal import find_peaks

from IPython.display import Audio

IMAGE_PATH = 'images/'
SOUND_PATH = 'sounds/'

Correlation in Signals¶

In the first exercise we will take a stop in the Olympic National Park and look for the Flicker Woodpecker. With its very distinctive call its going to fit perfectly for the following analyses.

Flicker

This exercise focuses on signal analysis using autocorrelation and cross-correlation methods.

Flicker Bird Call¶

  • What: For this first task we are going to utilize a common type of woodpecker's chirping calls, namely the Flicker. The Flicker, specifically the Northern Flicker, holds a notable place in the ecosystem and culture of the Pacific Northwest. This bird, recognized by its distinctive plumage and loud call, is commonly found across the region's diverse habitats, from forests to urban areas. It plays a vital ecological role, particularly in controlling insect populations, and is also a significant bird in the cultural lore and art of various Indigenous peoples in the Pacific Northwest.
  • Why:
    • Distinctive Patterns: Bird calls, like those of flicker birds, often have unique and repetitive patterns. These patterns make them ideal for analysis through autocorrelation, as the technique aims to identify and analyze repeating motifs within a signal.
    • Periodicity: The periodic nature of bird calls, with distinct intervals between chirps or sequences of chirps, lends itself well to being analyzed by an autocorrelogram, which can visually display the periodicity of these patterns.
    • Variability for Cross-Correlation: Bird calls can vary slightly in pitch, tone, or duration, providing a natural test case for cross-correlation analysis. By extracting a segment of the call and then finding it in the larger signal, we can explore how minor variations affect the ability to detect and match the segment within the original signal.

Source of sound: https://www.youtube.com/watch?v=YQ2wIOcO7aE

In [ ]:
flicker, flicker_sr = librosa.load(SOUND_PATH + 'flicker.mp3')

plt.figure(figsize=(10, 4))
plt.title('Flicker Sound Wave')
plt.plot(flicker)
plt.xlabel('Time')
plt.ylabel('Amplitude')
plt.show()

display(Audio(flicker, rate=flicker_sr))
No description has been provided for this image
Your browser does not support the audio element.

As we can see, the signal shows a lot of peaks and valleys in consistent time differences.

Autocorrelation¶

Autocorrelation is a method used to determine the relationship between a signal and a delayed version of itself over various time intervals.

What Autocorrelation does Mathematically¶

Mathematically, autocorrelation computes the correlation of a signal with itself at different time lags. It involves calculating the correlation coefficient, which measures the degree of similarity between the original signal and a time-shifted version of the same signal.

  1. Correlation Computation: The function uses np.correlate(y, y, mode='full') from the numpy library to compute the autocorrelation of the input signal y. In mathematical terms, this step calculates the convolution of the signal y with itself, which is the integral (or sum, in the discrete case) of the product of y and a time-shifted version of y. The 'full' mode ensures that the convolution is computed over the entire range of shifts.

  2. Normalization: Normalization: The line autocorr /= autocorr.max() normalizes the autocorrelation sequence. This step scales the autocorrelation values so that the maximum value in the sequence is 1.

  3. Lag Limitation: The function limits the range of lags (time shifts) it considers to a user-defined max_lag. It creates an array lags that represents the discrete time lags (from 0 to max_lag - 1). The line autocorr = autocorr[:max_lag] trims the autocorrelation sequence to only include values up to this maximum lag. This focuses the analysis on a specific range of time shifts - in our case we don't need to lag the signal across its own entire number of samples to illustrate its periodicity.

In [ ]:
def calculate_sound_autocorrelation(y, sr, max_lag):
    autocorr = np.correlate(y, y, mode='full')
    autocorr = autocorr[autocorr.size // 2:]
    
    autocorr /= autocorr.max()
    
    lags = np.arange(0, max_lag)
    autocorr = autocorr[:max_lag]

    return lags, autocorr

Expectation of Result¶

When we took a look at the Flicker's Chirping Signal we noted down that it already shows periodic peaks and valleys. Now that we know how the autocorrelation technique works, let's note down a few expectations:

  • The correlation should be high whenever the signal overlaps a peak with another peak or a valley with another valley
  • It should be low whenever the overlapping shapes don't match well
In [ ]:
lags, autocorr = calculate_sound_autocorrelation(flicker, flicker_sr, flicker_sr)

plt.figure(figsize=(10, 4))
plt.plot(lags, autocorr)
plt.xlabel('Lag (samples)')
plt.ylabel('Autocorrelation')
plt.title('Autocorrelation of Flicker Bird Call')
plt.show()
No description has been provided for this image

As we can see, at 0 lags the correlation is 1 - This is when the signals practically overlap each other at no time shift. When we start to shift the signal in time the correlation starts to decrease.

Another observation we can make is the periodic bulges in the signal; These signal the overlapping peaks and valleys of different chirps.

Let's visualize and explore these findings and validate them numerically.

Measuring periodicity numerically¶

To put a number to our observation of periodicity we can visualize the highest peaks in the autocorrelation.

In [ ]:
peak_samples, peak_correlations = find_peaks(autocorr, height=0.03, distance=1000)

To find the peaks I used find_peaks from scipy.signal. This method returns the indices in the signal of the peaks.

Chosen Parameters:

  • height = 0.03: I found the normalized correlation of 0.03 as the best descriptor for the peaks as all of these bulges at least go up to that value while the valleys don't reach that value.
  • distance = 1000: The distance of 1000 makes sure that find_peaks doesn't detect multiple peaks in one single bulge.
In [ ]:
plt.figure(figsize=(10, 4))
plt.plot(lags, autocorr)
plt.vlines(peak_samples, -1, 1, color='r')
plt.ylabel('Autocorrelation')
plt.xlabel('Lag (samples)')
plt.title('Periodicity of Flicker Bird Call in the Autocorrelation')
plt.show()
No description has been provided for this image

The plot shows, as expected, the peaks of every bulge.

To see if our theory with the overlaps matches the periodicity we measured with the peaks we can measure the average difference between the peaks and then shift a second replica of the same signal by that average difference.

In [ ]:
avg_peak_diff = np.diff(peak_samples).mean()
flicker_shifted = np.roll(flicker, int(avg_peak_diff))

time_between_peaks = avg_peak_diff / flicker_sr

plt.figure(figsize=(10, 4))
plt.plot(flicker)
plt.plot(flicker_shifted, color='r')
plt.xlabel('Sample')
plt.ylabel('Amplitude')
plt.suptitle('Flicker Bird Call', fontsize=16)
plt.title(f'Average Period: {avg_peak_diff:.2f} samples ({time_between_peaks:.2f} seconds)')
plt.legend(['Original', 'Shifted'])
plt.show()
No description has been provided for this image

As we can see, the average difference between each chirp is at about 3364 samples or 0.15 seconds.

Shifted by one chirp, the shifted signal exactly overlaps the original one.

Finding A Sample via Cross-Correlation¶

Before we define the cross-correlation we need to crop out a section of the signal that we would like to find.

In this example we first take the second chirp (2nd peak in the signal) from its neighboring valleys to the peak itself. Therefore, I chose the sample_window from 4500 up to 6000.

In [ ]:
sample_window = (4500, 6000)
flicker_sample = flicker[sample_window[0]:sample_window[1]]

plt.figure(figsize=(10, 4))
plt.plot(flicker)

plt.axvspan(sample_window[0], sample_window[1], alpha=0.3, color='r')
plt.plot(range(sample_window[0], sample_window[1]), flicker[sample_window[0]:sample_window[1]], color='red')

plt.xlabel('Sample')
plt.ylabel('Amplitude')
plt.title('Flicker Bird Call Sample')
plt.legend(['Original', 'Sample Window'])
plt.show()
No description has been provided for this image

The plot above shows the window I chose (4500 to 6000) marked in red in the signal.

Choosen Parameters:

  • sample_window = (4500, 6000): The sample window includes a peak and its neighboring values - A wide enough sample to capture the unique structure of this section.

Defining Cross-Correlation¶

Cross-correlation is a measure of the similarity between two different time series as a function of the lag of one relative to the other. Mathematically, it's similar to autocorrelation, but instead of comparing a signal with a delayed version of itself, cross-correlation compares two distinct signals at different lags.

In this exercise I used np.correlate which is numpy's implementation of the cross-correlation, but here's the breakdown of how we can achieve the cross-correlation of a signal:

  1. You take two signals, let's say x(t) and y(t) (in this case, both of these signals would be the Flicker call)
  2. For each lag (time shift) k, you shift y(t) by k units of time
  3. Then, you multiply the shifted y(t + k) by x(t) at each point and sum all the products
  4. The sum then can be interpreted as the cross-correlation at that particular lag

This process gets repeated for varius lags (until all lags/samples have been correlated) to construct the entire cross-correlation function

What are the differences to Autocorrelation?¶

Both the autocorrelation and cross-correlation deal with correlation of signals accross time lags so there might be some confusion, here are the key differences:

  • Autocorrelation is the correlation of a signal with itself at various time shifts, whereas cross-correlation is between two different signals
  • Autocorrelation can reveal the internal structure of a single signal, often used to detect periodicity or seasonality. Cross-correlation is used to find the time delay (lag) between two signals, often to identify if and when one signal might be leading or lagging the other
  • For autocorrelation, a peak at zero lag is guaranteed since a signal is always perfectly correlated with itself with no delay. In cross-correlation, a peak (which can be positive or negative) at a non-zero lag indicates that one signal is similar to the other but shifted in time

Expectation of Result¶

  • When correlating the chosen sample_window we should get a distinctive peak in the correlogram that goes up to 1
  • As described, the window captures enough samples to describe the uniqueness of the given chirp so the cross-correlation shouldn't have too much issues detecting the correct position
In [ ]:
cross_corr = np.correlate(flicker, flicker_sample, mode='full')

cross_corr /= cross_corr.max()

detected_peak = np.argmax(cross_corr)

plt.figure(figsize=(10, 4))
plt.plot(cross_corr)
plt.vlines(detected_peak, -1, 1, alpha=.5, color='r', linestyle='--')
plt.xlabel('Lag (samples)')
plt.ylabel('Cross-correlation')
plt.suptitle('Cross-correlation of Flicker Bird Call Sample')
plt.title(f'Peak at {detected_peak + 1} samples')
plt.show()
No description has been provided for this image

As expected, the detected peak at a cross-correlation of 1 emerges at the second overall peak in the correlogram.

To validate this observation, we now can try to mark the window again by using the detected maximum-correlation (with cross_corr.argmax()).

The nature of the argmax()-Function yields one sample before the right end of the window, thus we have to add +1 to the index.

In [ ]:
detected_end = cross_corr.argmax() + 1

sample_length = sample_window[1] - sample_window[0]

detected_start = detected_end - sample_length

plt.figure(figsize=(10, 4))
plt.plot(flicker)
plt.axvspan(detected_start, detected_end, alpha=0.3, color='r')
plt.plot(range(detected_start, detected_end), flicker[detected_start:detected_end], color='red')
plt.xlabel('Sample')
plt.ylabel('Amplitude')
plt.suptitle('Flicker Bird Call Detected Region')
plt.title(f'Detected Sample Region: {detected_start} to {detected_end}')
plt.legend(['Flicker Bird Call', 'Detected Sample Region'])
plt.show()

assert detected_end == sample_window[1] and detected_start == sample_window[0], 'Incorrect detected region'
No description has been provided for this image

The plot shows that we have been able to correctly identify the sample using cross-correlation.

Robustness of Cross-Correlation¶

To see how the cross-correlation behaves in more difficult scenarios where the signal might be altered we explore some edge cases.

Cross-Correlation in a noisy environment¶

In this example we explore what happens if we add noise to the original signal and try to match the same sample window.

In [ ]:
np.random.seed(1337) # for reproducibility

noise = np.random.normal(-1, 1, flicker.shape)
flicker_noisy = flicker + noise

cross_corr_noisy = np.correlate(flicker_noisy, flicker_sample, mode='full')
cross_corr_noisy /= cross_corr_noisy.max()

detected_end = cross_corr_noisy.argmax() + 1

sample_length = sample_window[1] - sample_window[0]

detected_start = detected_end - sample_length

plt.figure(figsize=(8, 10))

plt.subplot(2, 1, 1)
plt.plot(cross_corr_noisy)
plt.xlabel('Lag (samples)')
plt.ylabel('Cross-correlation')
plt.title('Cross-correlation of Flicker Bird Call Sample with Noise')
plt.grid(True)

plt.subplot(2, 1, 2)
plt.plot(cross_corr)
plt.xlabel('Lag (samples)')
plt.ylabel('Cross-correlation')
plt.title('Cross-correlation of Flicker Bird Call Sample')
plt.grid(True)
plt.show()

plt.figure(figsize=(10, 4))
plt.plot(flicker_noisy)
plt.axvspan(detected_start, detected_end, alpha=0.3, color='r')
plt.plot(range(detected_start, detected_end), flicker_noisy[detected_start:detected_end], color='red')
plt.xlabel('Sample')
plt.ylabel('Amplitude')
plt.title('Flicker Bird Call Detected Region')
plt.legend(['Flicker Bird Call', 'Detected Sample Region'])
plt.show()
No description has been provided for this image
No description has been provided for this image

The resulting plots show that the added noise rose the overall correlation in all samples. The sample window still got detected correctly.

Cross-Correlation with a more narrow window¶

To see how the cross-correlation behaves in a scenario with less information or less structural variance we define a narrow_window with a width of only 500 samples.

In [ ]:
narrow_window = (6500, 7000)
narrow_flicker_sample = flicker[narrow_window[0]:narrow_window[1]]

plt.figure(figsize=(10, 4))
plt.plot(flicker)
plt.axvspan(narrow_window[0], narrow_window[1], alpha=0.3, color='r')
plt.plot(range(narrow_window[0], narrow_window[1]), flicker[narrow_window[0]:narrow_window[1]], color='red')
plt.xlabel('Sample')
plt.ylabel('Amplitude')
plt.title('Flicker Bird Call Sample (Narrow)')
plt.legend(['Original', 'Narrow Sample Window'])
plt.show()
No description has been provided for this image

This time, the selected window is in a valley of the signal. If we look at the rest of the signal we can see that there are multiple locations which have about the same structure when it comes to the magnitude of the amplitude.

Expected Result¶

  • Because there are numerous other "valleys" with similar structure I expect the result to have numerous high correlations.
In [ ]:
narrow_cross_corr = np.correlate(flicker, narrow_flicker_sample, mode='full')

narrow_cross_corr /= narrow_cross_corr.max()

detected_narrow_peak = np.argmax(narrow_cross_corr)

plt.figure(figsize=(10, 4))
plt.plot(narrow_cross_corr)
plt.vlines(detected_narrow_peak, -1, 1, alpha=.5, color='r', linestyle='--')
plt.xlabel('Lag (samples)')
plt.ylabel('Cross-correlation')
plt.title('Cross-correlation of Flicker Bird Call (Narrow Sample)')
plt.show()
No description has been provided for this image

As indicated in the plot above, the narrow window was not detected properly with the maximum cross-correlation technique I implemented.

Lets validate the findings by marking the detected section in the original Flicker Call signal.

In [ ]:
detected_narrow_end = detected_narrow_peak + 1

narrow_sample_length = narrow_window[1] - narrow_window[0]

detected_narrow_start = detected_narrow_end - narrow_sample_length

plt.figure(figsize=(10, 4))
plt.plot(flicker)
plt.axvspan(detected_narrow_start, detected_narrow_end, alpha=.3, color='r')
plt.axvspan(narrow_window[0], narrow_window[1], alpha=.3, color='green')
plt.plot(range(narrow_window[0], narrow_window[1]), flicker[narrow_window[0]:narrow_window[1]], color='green')
plt.plot(range(detected_narrow_start, detected_narrow_end), flicker[detected_narrow_start:detected_narrow_end], color='red')
plt.xlabel('Sample')
plt.ylabel('Amplitude')
plt.title('Flicker Bird Call Detected Narrow Region')
plt.legend(['Flicker Bird Call', 'Detected Narrow Sample Region', 'Actual Narrow Sample Region'])
plt.show()
No description has been provided for this image

The plot above shows the narrow window that should have been detected and the one that was detected by the implemented technique. As we can see the visual validation also yielded the false detection that we expected.

Conclusion & Learnings¶

When using cross-correlation to find a signal in a sample, it is important to consider the sample window size.

  • A larger window can lead to a broader peak in the cross-correlation function, which may not accurately represent the true time lag between the signals.
  • On the other hand, a window that is too small may not capture the relevant information in the signals. Therefore, it is essential to carefully choose a sample window size that balances the need to capture relevant signal information with the potential impact on the cross-correlation result

A source that I found helpful and that also stated this observation was https://www.med.upenn.edu/mulab/crosscorrelation.html.

Segmentation, morphological operations, object attributes in images¶

The next stop of our journey is the Mountain Range "Selkirk Mountains". With a Grizzly Population of around 70 bears (Source) we are sure to find a family of bears nice enough to take a photograph to segment and detect the bears later on!

Threshold-based Segmentation: Segmenting a Family of Brown Bears¶

  • What: Brown bears, also known as grizzly bears, have a profound connection to the Pacific Northwest. They are an integral part of the region's ecosystem, symbolizing the wild and natural heritage of this area. Historically, they occupied a range of habitats from coastal to mountainous areas within the Pacific Northwest, and they play a critical role in the food web as apex predators and as agents of nutrient distribution, particularly through their salmon fishing activities.
  • Why:
    • Contrast Differences: Bears captured by a photograph in nature often times stand out visibly when they are on a wildlife path or on a meadow. This will help a lot when switching to the binary space and then applying morphological operations.
    • Complexity of Shapes: Bears on photographs usually come in more complex shapes and not just a simple geometric base-shape - Especially when they are on foot and the legs might overlap on each other. This will create a challenge when it comes to morphological operations.
In [ ]:
img_rgb = cv2.cvtColor(cv2.imread(IMAGE_PATH + 'bears.jpg', cv2.IMREAD_COLOR), cv2.COLOR_BGR2RGB)
img = cv2.cvtColor(img_rgb, cv2.COLOR_BGR2GRAY)

plt.figure(figsize=(10, 4))
plt.subplot(121)
plt.imshow(img_rgb)
plt.axis('off')
plt.title('Original Image')
plt.subplot(122)
plt.imshow(img, cmap='gray')
plt.axis('off')
plt.title('Grayscale Image')
plt.show()
No description has been provided for this image

Step 1: Enhancing the Contrast¶

Goal:

  • The goal is to enhance the contrast as much as possible so that we can maximize the difference between the background (grasslands) and the foreground (bears).

Possible Approaches:

To achieve the goal there are a couple of options:

  1. cv2.convertScaleAbs(): This method is used to adjust the brightness and contrast of an image through a combination of scaling and shifting the pixel values. The mathematical operation performed by this function can be represented as: $$ dst(I) = saturate\_cast(\alpha\cdot src(I) + \beta)$$ where $\alpha$ and $\beta$ are the scaling and shifting parameters respectively. The $\text{saturate\_cast}$ function will cast negative values to the minimum possible value for a datatype and the maximum possible value for positive values.
  2. cv2.addWeighted(): On the other hand, the cv2.addWeighted() function calculates the weighted sum of the same image twice to adjust the contrast and brightness according to where the image already has contrast. The mathematical operation can be represented as: $$ dst(I) = saturate(src_1(I)\cdot\alpha+src_2(I)\cdot\beta+\gamma) $$ where $\alpha$ and $\beta$ are the weights for the input arrays and $\gamma$ is a scalar added to each sum
  3. Histogram Equalization: The Histogram Equalization technique uses the intensity distribution of the image's histogram. It works by spreading out the most frequent intensity values in an image, resulting in a roughly linear cumulative distribution function. This method increases the global contrast of many images, especially when the image is represented by a narrow range of intensity values.
  4. CLAHE: The Contrast Limited Adaptive Histogram Equalization (CLAHE) technique adds another dimension to contrast enhancement. Unlike the previous methods, CLAHE is a non-linear contrast enhancement technique that works on small regions in the image, rather than the entire image at once. It divides the image into small tiles and performs histogram equalization on each tile. This helps in preventing the overamplification of noise in relatively homogeneous regions of the image, which can be a drawback of global histogram equalization. The operations of CLAHE involve the following steps:
    • The image is divided into small blocks called "tiles."
    • A contrast-limited histogram equalization is performed on each of these tiles.
    • The contrast enhancement in each tile is achieved by redistributing the lightness values using the histogram equalization method.
    • Finally, the contrast-enhanced tiles are combined to produce the final image.
In [ ]:
convert_scale_abs = cv2.convertScaleAbs(img, alpha=1.5, beta=1.5)

weighted = cv2.addWeighted(img, 1.5, np.zeros(img.shape, img.dtype), 0, 0)

histogram_eq = cv2.equalizeHist(img)

clahe = cv2.createCLAHE(clipLimit=1.0, tileGridSize=(3,3))
clahe = clahe.apply(img)
img_cont_clahe = cv2.addWeighted(clahe, 1.5, np.zeros(img.shape, img.dtype), 0, 0)

plt.figure(figsize=(20, 10))

plt.subplot(2, 5, 1)
plt.imshow(img, cmap='gray')
plt.axis('off')
plt.title('Original Image')

plt.subplot(2, 5, 2)
plt.imshow(convert_scale_abs, cmap='gray')
plt.axis('off')
plt.title('Convert Scale Abs')

plt.subplot(2, 5, 3)
plt.imshow(weighted, cmap='gray')
plt.axis('off')
plt.title('Weighted')

plt.subplot(2, 5, 4)
plt.imshow(histogram_eq, cmap='gray')
plt.axis('off')
plt.title('Histogram Equalization')

plt.subplot(2, 5, 5)
plt.imshow(img_cont_clahe, cmap='gray')
plt.axis('off')
plt.title('CLAHE')

plt.subplot(2, 5, 6)
plt.hist(img.ravel(), bins=256, range=(0, 255))
plt.title('Original Image Histogram')

plt.subplot(2, 5, 7)
plt.hist(convert_scale_abs.ravel(), bins=256, range=(0, 255))
plt.title('Convert Scale Abs Histogram')

plt.subplot(2, 5, 8)
plt.hist(weighted.ravel(), bins=256, range=(0, 255))
plt.title('Weighted Histogram')

plt.subplot(2, 5, 9)
plt.hist(histogram_eq.ravel(), bins=256, range=(0, 255))
plt.title('Histogram Equalization Histogram')

plt.subplot(2, 5, 10)
plt.hist(img_cont_clahe.ravel(), bins=256, range=(0, 255))
plt.title('CLAHE Histogram')

plt.show()
No description has been provided for this image

The CLAHE Equalization seems to be the most beneficial in our case as it gives the bears enough contrast while almost completely illuminating the foreground (road) and background (meadow) to white pixels and therefore separates the both in an efficient way.

Step 2: Binarizing the Image¶

To build upon the separation process and to reach a point where we will be only able to see the bears to label them we need to reach a mask of the bears itself. The first step towards this result is to get a binarized image. Goal:

  • Binarize the image into 1s and 0s so that the bear's shapes are retained fully and as much from the background is not included.

Thresholding Technique:

  • Otsu's Method: Otsu's Method is used for automatic image thresholding, aiming to separate pixels into two classes, foreground and background, by finding a single intensity threshold that minimizes intra-class intensity variance (pixel intensities within the same class, such as foreground or background). The method works by maximizing the between-class variance to determine the best threshold value for image segmentation.
In [ ]:
thresh = threshold_otsu(img_cont_clahe)
bw = closing(img_cont_clahe > thresh, square(3))

plt.figure(figsize=(15, 4))

plt.subplot(131)
plt.imshow(img_cont_clahe, cmap='gray')
plt.axis('off')
plt.title('Image Before Thresholding')

plt.subplot(133)
plt.hist(img_cont_clahe.ravel(), bins=256)
plt.axvline(thresh, color='r')
plt.legend((f'Otsu Threshold at: {thresh}',))
plt.xlabel('Pixel Intensity')
plt.ylabel('Pixel Count')
plt.title('Pixel Intensity Histogram')

plt.subplot(132)
plt.imshow(bw, cmap='gray')
plt.axis('off')
plt.title('Thresholded Image')

plt.show()
No description has been provided for this image

Otsu's method found 172 to be the best threshold to binarize the image and the result is fairly good. The areas of the four bears seem to have been maximized correctly. The only leftovers are smaller fragments in the foreground and background.

Step 3: Getting rid of the fragments¶

First, before we remove anything we need to invert the binarized image so that the bears get the value 1 assigned and the background 0.

After that we can start to remove the unwanted fragments. The easiest ones to remove are the ones that touch the border of the image using clear_border.

In [ ]:
bw = np.logical_not(bw) # inverts the image

cleared_border = clear_border(bw)

plt.figure(figsize=(10, 4))

plt.subplot(121)
plt.imshow(bw, cmap='gray')
plt.axis('off')
plt.title('Thresholded Image')

plt.subplot(122)
plt.imshow(cleared_border, cmap='gray')
plt.axis('off')
plt.title('Border Cleared Image')

plt.show()
No description has been provided for this image

The next step in the fragment removal are the fragments that form smaller shapes, so the approach would be to erode the image.

For this step I found a circular kernel (disk()) with radius 3 to remove the most free-floating fragments:

In [ ]:
cleared_border_arr = np.array(cleared_border, dtype=np.uint8)
cleared_fragments = cv2.erode(cleared_border_arr, disk(3))

plt.figure(figsize=(10, 4))

plt.subplot(131)
plt.imshow(cleared_border, cmap='gray')
plt.axis('off')
plt.title('Before Image')

plt.subplot(132)
plt.imshow(disk(3), cmap='gray')
plt.axis('off')
for (j,i),val in np.ndenumerate(disk(3)):
    plt.text(i,j,val,ha='center',va='center', color='r')
plt.title('Kernel')

plt.subplot(133)
plt.imshow(cleared_fragments, cmap='gray')
plt.axis('off')
plt.title('Cleared Fragments')

plt.show()
No description has been provided for this image

As the plot above shows there are some tradeoffs in this step:

  • Positives:
    • The shadows beneath the bears got reduced or even removed
  • Negatives:
    • Through erosion the small dents and holes in some of the bears got larger

Because of the fact that the bears now have holes we can close the holes again through dilation. The first, third and fourth bear (from left to right) all have hole-spots between their front legs which are overlapping as they're stepping forward and those should not be closed - Therefore I chose a small enough disk with radius of 2 to not affect these negative spots.

In [ ]:
cleared_fragments_arr = np.array(cleared_fragments, dtype=np.uint8)
dilated_mask = cv2.dilate(cleared_fragments_arr, disk(2))

plt.figure(figsize=(10, 4))

plt.subplot(131)
plt.imshow(cleared_fragments, cmap='gray')
plt.axis('off')
plt.title('Before Image')

plt.subplot(132)
plt.imshow(disk(2), cmap='gray')
plt.axis('off')
for (j,i),val in np.ndenumerate(disk(2)):
    plt.text(i,j,val,ha='center',va='center', color='r')
plt.title('Kernel')

plt.subplot(133)
plt.imshow(dilated_mask, cmap='gray')
plt.axis('off')
plt.title('Dilated Bear Mask')

plt.show()
No description has been provided for this image

The mentioned holes got smaller but we can also notice an increase of area on the separate fragments - Which is an obvious result when applying dilation/closing.

In order to not affect these separate fragments anymore, the next goal is to remove them as good as possible.

We can achieve this through eroding the image. Through applying an erosion kernel of disk(3) we can almost diminish all fragments without loosing too much detail on the bear's shapes. If we apply a larger kernel we may lose details on the narrow legs of the bears - which is not what we want.

In [ ]:
dilated_mask_arr = np.array(dilated_mask, dtype=np.uint8)
opened_mask = cv2.morphologyEx(dilated_mask_arr, cv2.MORPH_OPEN, disk(3))

plt.figure(figsize=(10, 4))

plt.subplot(131)
plt.imshow(dilated_mask, cmap='gray')
plt.axis('off')
plt.title('Before Image')

plt.subplot(132)
plt.imshow(disk(3), cmap='gray')
plt.axis('off')
for (j,i),val in np.ndenumerate(disk(3)):
    plt.text(i,j,val,ha='center',va='center', color='r')
plt.title('Kernel')

plt.subplot(133)
plt.imshow(opened_mask, cmap='gray')
plt.axis('off')
plt.title('Opened Bear Mask')

plt.show()
No description has been provided for this image

As expected the small fragments now got removed completely. Now that we reached this milestone we can try to close the unwanted holes again.

The areas of all bears are larger than the unwanted fragments that still stayed on until now so we will be able to remove them later based on their areal size.

In [ ]:
opened_mask_arr = np.array(opened_mask, dtype=np.uint8)
closed_mask = cv2.morphologyEx(opened_mask_arr, cv2.MORPH_CLOSE, disk(3))

plt.figure(figsize=(10, 4))

plt.subplot(131)
plt.imshow(opened_mask, cmap='gray')
plt.axis('off')
plt.title('Before Image')

plt.subplot(132)
plt.imshow(disk(3), cmap='gray')
plt.axis('off')
for (j,i),val in np.ndenumerate(disk(3)):
    plt.text(i,j,val,ha='center',va='center', color='r')
plt.title('Kernel')

plt.subplot(133)
plt.imshow(closed_mask, cmap='gray')
plt.axis('off')
plt.title('Closed Bear Mask')

plt.show()
No description has been provided for this image

The plot above shows how the faulty holes were able to be closed by applying a closing kernel of disk(3).

Now, as mentioned, we can remove the last few objects by utilizing skimage's remove_small_objects()-Function. It removes or sets objects in the binarized image to 0 if they are smaller than a specified min_size, which in this case is 300 pixels. This value has proven to effectively remove the remaining fragments that are not connected to the bears.

After the removal of these fragments we can apply label() which enumerates all connected neighboring 1-values into their own number.

In [ ]:
small_obj_removed = remove_small_objects(np.array(closed_mask, dtype=bool), 300)

labeled_image = label(small_obj_removed)

plt.figure(figsize=(10, 4))

plt.subplot(131)
plt.imshow(closed_mask, cmap='gray')
plt.axis('off')
plt.title('Before Image')

plt.subplot(132)
plt.imshow(small_obj_removed, cmap='gray')
plt.axis('off')
plt.title('Small Objects Removed')

plt.subplot(133)
plt.imshow(labeled_image)
plt.axis('off')
plt.title('Labeled Bear Mask')

plt.show()
No description has been provided for this image

Like expected, only the bear's shapes are now remaining and the label()-Function labeled the background and the four bears into each one color.

Now thanks to this labeled mask we can mark the bears on the original image:

In [ ]:
contour_image = img_rgb.copy()

for region in regionprops(labeled_image):
    minr, minc, maxr, maxc = region.bbox
    
    rect = cv2.rectangle(contour_image, (minc, minr), (maxc, maxr), (255, 0, 0), 2)

plt.figure(figsize=(12, 6))

plt.subplot(131)
plt.imshow(img_rgb)
plt.title('Original Image')
plt.axis('off')

plt.subplot(132)
plt.imshow(labeled_image)
plt.title('Labeled Bear Mask')
plt.axis('off')

plt.subplot(133)
plt.imshow(contour_image)
plt.title('Detected Bears')
plt.axis('off')

plt.show()
No description has been provided for this image

Step 4: Cut out each bear separately¶

To further process the bear segments for later tasks (Skeletonization) and to measure the properties of the bears we crop them out using the code below.

Following measurements were chosen:

  • area: This is a measure of the size of the bear in the image. It can be used to differentiate between bears based on their apparent size in the image, which might correspond to their actual size, age, or distance from the camera. In ecological studies, for example, area measurements can help estimate animal populations or track the growth of individuals.
  • perimeter: This gives an idea of the complexity of the bear's outline or shape. A higher perimeter relative to the area might indicate a more complex or detailed edge, which could be due to the bear's fur texture, posture, or movement. Perimeter measurements can be useful in differentiating species or individuals based on the silhouette.
  • brightness: This metric refers to the average brightness of the pixels within the segmented area of the bear. Brightness can be influenced by the bear's fur color, the lighting conditions, and the presence of shadows. In scenarios where bears need to be detected under varying lighting conditions or where their fur color is of interest (e.g., distinguishing between species or individuals), this metric can be particularly useful.
In [ ]:
bear_images = []
bear_measurements = []

for n_class in np.unique(labeled_image)[1:]: # excluding the background class
    bear_region = labeled_image == n_class
    
    # crop out the tree from the original image
    bear_img = img_rgb * np.expand_dims(bear_region, axis=2)

    # crop the image to the bounding box for the tree
    bear_images.append(bear_img[np.ix_(bear_region.any(1), bear_region.any(0), [0, 1, 2])])

    # measurements
    area = bear_region.sum()
    bear_perimeter = perimeter(bear_region)
    grey_img = cv2.cvtColor(bear_img, cv2.COLOR_RGB2GRAY)
    grey_img = grey_img.flatten()
    grey_img = grey_img[grey_img != 0]
    normalized_img = grey_img / 255.0
    bear_brightness = normalized_img.sum() / area
    
    row = { 
           'bear': n_class, 
           'area': area, 
           'perimeter': bear_perimeter,
           'brightness': bear_brightness
          }
    
    bear_measurements.append(row)

fig, axes = plt.subplots(1, 4, figsize=(10,3))
fig.suptitle('Detected Bears and their Properties', fontsize=16)
for i, ax in enumerate(axes.flat):
    if i < len(bear_images):
        ax.imshow(bear_images[i])
        area = bear_measurements[i]['area']
        bear_perimeter = bear_measurements[i]['perimeter']
        brightness = bear_measurements[i]['brightness']
        
        ax.set_xticks([])
        ax.set_yticks([])
        ax.set_title(f'Bear {i+1}')
        ax.set_xlabel(f'Area: {area}\nPerimeter: {bear_perimeter:.2f}\nBrightness: {brightness:.2f}')
    else:
        ax.axis('off')
plt.tight_layout()
plt.show()
No description has been provided for this image

Now that we look at the segments from our masks, we can see that most bears were segmented without their snout. This was sadly a tradeoff I couldn't nicely solve without fiddling around with local operations that would've made the snout darker, so it didn't get cut off by equalization and thresholding because the snouts of these bears were white.

Another observation is that especially Bear #2 still has quite a large part of background on the segmentation, this could be because that bear actually has the most complex shape in the lower section with the legs - Through the erosion and dilation steps a lot of shape information got lost.

Conclusion & Learnings¶

Traditional Techniques (e.g., Otsu's Thresholding and Morphological Operations) and their problems:

  • Sensitivity to Variability: Traditional methods are often sensitive to changes in lighting, contrast, and noise. For instance, parts of the bears that blend into the background or are not well-illuminated may not be correctly segmented.
  • Lack of Contextual Understanding: These methods operate on a pixel or local neighborhood level without an understanding of the object as a whole. Thus, they can include background regions or miss parts of the object that have similar pixel values to the background.
  • Static Rules: They follow preset rules and thresholds, which may not be robust against the natural variability in wildlife images, such as different animal poses, overlapping objects, or various environments.

To conclude this part, in a real-world-scenario it might be more beneficial to use pretrained models or train a model oneself to make sure the context of the entiere bear is captured.

Skeletonizing A Bear's Segment¶

To skeletonize a segmentation of a bear I chose the segment of Bear #4 because that bear was preserved quite well, even with the snout.

What Skeletonization does¶

The goal of skeletonization is to express an objects shape and orientation through just a single line. This is possible through morphological thinning: This technique involves a repeated cycle of erosion and dilation of the image. The procedure employs a specific structuring element that strategically eliminates pixels from bold parts inwards (from the thick edges) of the object.

In [ ]:
bear_bw = cv2.cvtColor(bear_images[-1], cv2.COLOR_RGB2GRAY)

bear_binarized = bear_bw > 0

bear_skeletonized = skeletonize(bear_binarized)

fig, axes = plt.subplots(1, 3, figsize=(10, 10))
axes[0].imshow(bear_bw, cmap='gray')
axes[0].set_title('Original Image')
axes[1].imshow(bear_binarized, cmap='gray')
axes[1].set_title('Binary Mask')
axes[2].imshow(bear_skeletonized, cmap='gray')
axes[2].set_title('Skeletonized Bear')
plt.tight_layout()
plt.show()

print(f'Pixels in Skeleton: {bear_skeletonized.sum()} pixels')
print(f'Pixels in Original: {bear_binarized.sum()} pixels')
print(f'Ratio: {bear_skeletonized.sum() / bear_binarized.sum():.2f}')
No description has been provided for this image
Pixels in Skeleton: 106 pixels
Pixels in Original: 1877 pixels
Ratio: 0.06

In this skeletonized image, the bear's shape has been reduced to the minimal form that represents its general structure which is representable by just 106 pixels, which is a reduction of 94% of information.

Color-based Thresholding: Measuring Your Neighborhoods Health of Greenery¶

Drone above Suburb

Now off to the suburbs! Just outside of Seattle, Issaquah stands as the home of many lucrative businesses such as Microsoft and Costco - The latter was even founded here. The streets of Issaquah are rich in greenery so a satellite image of this neat suburb will help us out for this next exercise:

  • What: This image shows a suburban neighborhood in Issaquah, Washington from birds-eye perspective.
  • Why:
    • Real World Application: I imagine the segmentation and measurement of the trees in a neighborhood could help cities to monitor the health of the trees that keep the streets, back alleys and cul-de-sacs lively.
    • Color Application: The trees in this aerial shot differ in color from the roofs and street's concrete. Though complexity gets added as not only the trees have the color green but also the sidewalks and front- and backyards of the houses.

Source of image: Google Earth screenshot

Step 1: Defining Bounding Colors¶

Contrary to the first experiment with the segmentation of bears we now can access the property of color to separate the trees from the other objects in the image.

Goal:

  • The goal in the first step is to find colors that segment the trees but not the grass as good as possible.

I chose to perform this Color-based thresholding technique in the HSV color space. When segmenting images based on color, the HSV (Hue, Saturation, Value) color space can often be more intuitive and effective than the RGB (Red, Green, Blue) color space, especially for natural elements like trees. Here's why:

  • Hue-Based Segmentation: The HSV color space separates the color (hue) from the intensity (value), which is not possible in the RGB color space. This means that in HSV, you can target a specific color or range of colors (like the greens of tree leaves) regardless of lighting and shadow effects, which often affect the perceived color in the RGB space.
  • Saturation as a Descriptor: Saturation describes the vibrancy of the color. Trees and vegetation often have a saturated color, which stands out against less saturated objects like buildings or pavements. In RGB, saturation is not explicitly represented, making it harder to filter out less vibrant colors.
  • Better Color Differentiation: In RGB, different colors can have similar red, green, or blue values, making it hard to distinguish them. HSV describes colors in a way that more closely aligns with human perception of colors, which can make it easier to distinguish and segment specific colors in an image.
In [ ]:
aerial = cv2.imread(IMAGE_PATH + 'issaquah_aerial.png')

image_rgb = cv2.cvtColor(aerial, cv2.COLOR_BGR2RGB)
image_hsv = cv2.cvtColor(aerial, cv2.COLOR_BGR2HSV)

lower_green = np.array([30, 30, 0])
upper_green = np.array([80, 255, 255])

mask = cv2.inRange(image_hsv, lower_green, upper_green)
green_areas = cv2.bitwise_and(image_rgb, image_rgb, mask=mask)

plt.figure(figsize=(20, 10))

plt.subplot(1, 2, 1)
plt.imshow(image_rgb)
plt.title('Original Aerial Image')
plt.axis('off')

plt.subplot(1, 2, 2)
plt.imshow(green_areas)
plt.title('Green Areas Highlighted')
plt.axis('off')

plt.show()

lower_green_area = cv2.cvtColor(np.uint8([[lower_green] * 30] * 30), cv2.COLOR_HSV2RGB)
upper_green_area = cv2.cvtColor(np.uint8([[upper_green] * 30] * 30), cv2.COLOR_HSV2RGB)

plt.figure(figsize=(6, 3))

plt.subplot(121)
plt.imshow(lower_green_area)
plt.title('Lower Green Boundary')
plt.axis('off')

plt.subplot(122)
plt.imshow(upper_green_area)
plt.title('Upper Green Boundary')
plt.axis('off')

plt.show()
No description has been provided for this image
No description has been provided for this image

The highlighted green areas that were already filtered with the bounding colors show that pretty much all trees, or at least the bigger ones get separated correctly. The smaller parts of grassy areas on sidewalks and frontyards only show up as smaller fragments so most of them should be possible to get rid off through area-based thresholding.

Choosen Parameters:

  • lower_green = np.array([30, 30, 0])
  • upper_green = np.array([80, 255, 255]) I chose these parameters with an online HSV color picker, to make sure I am trying out different bounding colors in the green part of the HSV space. These chosen greens proved to yield the best result when it comes to the tree/grass tradeoff I enountered.

Step 2: Preprocessing in HSV¶

As in the first step already visually mentioned, the green parts often come in smaller segments and the shapes of the trees are often ragged.

Goal:

  • Smoothen edges of objects by blurring the image
In [ ]:
image_blurred = cv2.GaussianBlur(image_hsv, (11, 11), 0)

plt.figure(figsize=(10, 4))

plt.subplot(121)
plt.imshow(image_hsv)
plt.title('Original Image')
plt.axis('off')

plt.subplot(122)
plt.imshow(image_blurred)
plt.title('Blurred Image')
plt.axis('off')

plt.show()
No description has been provided for this image

The image in HSV now shows that the ragged bits got blurred and the shapes are now smoother. This preprocessing step should now make the next steps (morphological operations) easier.

Step 3: Applying Morphological Operations¶

In this step I now separate the trees from the entire image using cv2.inRange. This operation retains the pixels that were defined in the color boundaries. On that binarized mask we then start to remove the small unwanted bits of grass with morphological opening.

Goal:

  • Remove the grass patches
  • Keep the trees in the mask
In [ ]:
mask = cv2.inRange(image_blurred, lower_green, upper_green)

kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
mask_opened = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel, iterations=2)

plt.figure(figsize=(12, 5))

plt.subplot(131)
plt.imshow(mask, cmap='gray')
plt.title('Masked with Color Boundaries')

plt.subplot(132)
plt.title('Opening Kernel')
plt.imshow(kernel, cmap='gray')
for (j, i), value in np.ndenumerate(kernel):
    plt.text(i, j, f'{int(value)}', ha='center', va='center', color='r')

plt.subplot(133)
plt.imshow(mask_opened, cmap='gray')
plt.title('Mask Opened')

plt.show()
No description has been provided for this image

This morphological operation already got rid of most of the grass patches we had before.

The new problem that has arisen is that the bigger trees on the upper street now have unwanted holes, so one might think the next operation needs to be dilation or morphological closing.

A problem that appears with further closing or applying dilation is that we also will start to close the gaps between trees that are near to each other - Essentially resulting in misslabeling two trees as one. To counteract that effect I apply distance_transform_edt() (Euclidean distance transform) to the opened mask from the plot above. This is useful because it provides a way to measure how 'deep' each foreground pixel is within the object. When objects are close to each other but not necessarily connected, the distance transform creates a gradient of values that peaks at the center of objects and decreases towards edges. This gradient can be used to identify individual objects, even in cases where they are close to each other.

In [ ]:
distance = distance_transform_edt(mask_opened)
distance_normalized = distance / distance.max()

plt.figure(figsize=(10, 4))

plt.subplot(121)
plt.imshow(mask_opened, cmap='gray')
plt.title('Mask Opened')

plt.subplot(122)
plt.imshow(distance_normalized, cmap='gray')
plt.title('Distance Transform')

plt.show()
No description has been provided for this image

To then get back a binarized mask I use Otsu's Method, which already was described in the Bear's Segmentation process.

This thresholded image then consists of the same trees but with wider gaps between them so that we now can try to dilate the holes inside the trees using that representation:

In [ ]:
thresh = threshold_otsu(distance_normalized)
distance_thresh = distance_normalized > thresh

distance_thresh = cv2.dilate(np.uint8(distance_thresh), kernel, iterations=3)

plt.figure(figsize=(10, 4))

plt.subplot(121)
plt.imshow(distance_normalized, cmap='gray')
plt.title('Distance Transform')

plt.subplot(122)
plt.imshow(distance_thresh, cmap='gray')
plt.title('Thresholded Distance Transform')

plt.show()
No description has been provided for this image

The new thresholded distance transform shows a the neighborhoods trees in a better separated way. Based on this binarized mask we can now start to label and remove wanted/unwanted fragments.

Step 5: Label Trees individually¶

In this last step we need to label the trees in the mask into a class of their own and then remove the last few faulty detected objects if there are still some.

To label each tree we again call the label()-Function that assigns a label to each tree.

Goal

  • Label trees
  • Remove unwanted fragments
In [ ]:
markers = label(distance_thresh)

labels_cleaned = remove_small_objects(markers, min_size=630)

plt.figure(figsize=(10, 4))

plt.subplot(121)
plt.imshow(markers, cmap='nipy_spectral')
plt.title('Simple Labels')

plt.subplot(122)
plt.imshow(labels_cleaned, cmap='nipy_spectral')
plt.title('Removed Small/Faulty Objects')

plt.show()
No description has been provided for this image

Choosen Parameters:

  • min_size = 630: This minimum size of each fragment cuts off faulty grass patches and unfortunately also some trees. I tried out a wide variety of different sizes in pixel area and found this one to cut off the least amount of trees but the maximum amount of grass patches.

To now visualize the the detected trees we can draw cv2.rectangles on the original RGB image:

In [ ]:
contour_image_watershed = image_rgb.copy()
for region in regionprops(labels_cleaned):
    minr, minc, maxr, maxc = region.bbox
    rect = cv2.rectangle(contour_image_watershed, (minc, minr), (maxc, maxr), (0, 255, 0), 2)

plt.figure(figsize=(12, 5))

plt.subplot(121)
plt.imshow(image_rgb)
plt.title('Original Image')
plt.axis('off')

plt.subplot(122)
plt.imshow(contour_image_watershed)
plt.title('Detected Trees')
plt.axis('off')

plt.tight_layout()
plt.show()

print('There are {} separate components / objects detected after applying the watershed algorithm.'.format(len(np.unique(labels_cleaned))))
No description has been provided for this image
There are 32 separate components / objects detected after applying the watershed algorithm.

As we can see most of the trees remained. The smaller and brighter trees in the back alley either got removed as they didn't meet the min_size area of 630 or they were too bright and therefore were cut off in the process of reducing the grass patches.

In the top right we can also see an example of a tree that still gets labeled in the correct location but not true to its actual size. This might have been due to the morphological operations and Euclidean Distance Transform that reduced the aerial size of that tree.

Measuring the Trees' Properties¶

And again, just like already encountered in the bear segmentation, to further process the tree segments for later tasks (Skeletonization) and to measure the properties of the bears we crop them out using the code below.

Following measurements were chosen:

  • area: Knowing the area covered by each tree or plant can help in assessing the overall green cover. It is an essential indicator of the green space available for ecological services like carbon sequestration, provision of habitat, and contribution to the urban canopy cover. Trees with a larger canopy area can often be indicators of good health and maturity.
  • perimeter: The perimeter provides information about the shape and spread of a plant or tree, which can be critical for understanding growth patterns and identifying any irregularities that might indicate disease or damage.
  • brightness: The brightness of vegetation in aerial images is often associated with the reflectance values captured by sensors, which can be indicative of plant health. Healthy vegetation typically reflects more infrared light and less visible light, leading to a characteristic bright signature in certain wavelengths. By analyzing the brightness, city planners can gauge the health and vigor of the plants. Changes in reflectance can indicate issues such as pest infestation, disease, drought stress, or overwatering.

By analyzing these metrics over time, city planners can:

  • Monitor growth trends and health of urban trees and plants.
  • Plan for urban space management, ensuring that areas with insufficient green cover are identified for potential planting projects.
  • Detect and respond to environmental stresses on plants, such as drought, disease, or pest infestations.
  • Evaluate the effectiveness of urban forestry policies and initiatives.
  • Engage in proactive maintenance and conservation efforts, prioritizing areas that show signs of declining plant health.
In [ ]:
tree_images = []
tree_measurements = []

for n_class in np.unique(labels_cleaned)[1:]: # excluding the background class
    tree_region = labels_cleaned == n_class
    tree_img = image_rgb * np.expand_dims(tree_region, axis=2)

    tree_images.append(tree_img[np.ix_(tree_region.any(1), tree_region.any(0), [0, 1, 2])])

    # Measurements
    area = tree_region.sum()
    tree_perimeter = perimeter(tree_region)
    grey_img = cv2.cvtColor(tree_img, cv2.COLOR_RGB2GRAY)
    grey_img = grey_img.flatten()
    grey_img = grey_img[grey_img != 0]
    normalized_img = grey_img / 255.0
    tree_brightness = normalized_img.sum() / area

    row = { 
           'tree': n_class, 
           'area': area, 
           'perimeter': tree_perimeter,
           'brightness': tree_brightness
          }
    
    tree_measurements.append(row)

fig, axes = plt.subplots(4, 8, figsize=(12, 12))
fig.suptitle('Detected Trees and their Properties', fontsize=16)
for i, ax in enumerate(axes.flat):
    if i < len(tree_images):
        ax.imshow(tree_images[i])
        area = tree_measurements[i]['area']
        perimeter = tree_measurements[i]['perimeter']
        brightness = tree_measurements[i]['brightness']
        
        ax.set_xticks([])
        ax.set_yticks([])
        ax.set_title(f'Tree {i+1}')
        ax.set_xlabel(f'Area: {area}\nPerimeter: {perimeter:.2f}\nBrightness: {brightness:.2f}')
    else:
        ax.axis('off')
plt.tight_layout(w_pad=0.5, h_pad=0.5)
plt.show()
No description has been provided for this image

When looking at the final applied segmentations we can see that the trees were more or less correctly segmented. The problem in this case is that a lot of these trees grew in pairs or triplets that touch each other. I also tried out to apply watershedding to better segment pairs of trees into individual ones but that did not yield better results as these trees didn't differ in color.

Similar to the case with the bears I would much rather explore the implementation of a model that can segment trees based on their shape context. Therefore the Conclusion and Learnings also apply to this experiment.

Skeletonizing a Tree's Segmentation¶

In [ ]:
TREE_TO_INSPECT = 11

tree_bw = cv2.cvtColor(tree_images[TREE_TO_INSPECT-1], cv2.COLOR_RGB2GRAY)

tree_binarized = tree_bw > 0 # binarized version of masked tree

tree_skeletonized = skeletonize(tree_binarized)

# plot the images
fig, axes = plt.subplots(1, 3, figsize=(10, 10))
axes[0].imshow(tree_bw, cmap='gray')
axes[0].set_title('Original Image')
axes[1].imshow(tree_binarized, cmap='gray')
axes[1].set_title('Thresholded Image')
axes[2].imshow(tree_skeletonized, cmap='gray')
axes[2].set_title('Skeletonized Image')
plt.tight_layout()
plt.show()

print(f'Pixels in Skeleton: {tree_skeletonized.sum()} pixels')
print(f'Pixels in Original: {tree_binarized.sum()} pixels')
print(f'Ratio: {tree_skeletonized.sum() / tree_binarized.sum():.2f}')
No description has been provided for this image
Pixels in Skeleton: 104 pixels
Pixels in Original: 5127 pixels
Ratio: 0.02

In this simplified image, the outline of the tree has been stripped down to a basic form that captures its overall framework using only 104 pixels, amounting to a 98% decrease in information.

Keypoint Matching¶

Space Needle

The last destination on this short trip through the PNW is downtown Seattle. Unmissably unique in the city's skyline, the Space Needle stands tall as an icon of the city. Let's try to explore it in its different angles and see if through pure mathematical features and operations we can match the keypoints between different images!

Preprocessing: Resizing and Cropping Images¶

The Histogram of Oriented Gradients (HOG) algorithm in later steps requires to divide images that should be keypoint-matched into bins. To implement this approach in the most straightforward way, I resize the images into a square images using the following method:

In [ ]:
def resize_and_crop(image, size, offset_x=0, offset_y=0, zoom=1):
    zoomed_size = int(size / zoom) # adjusting size based on zoom factor

    height, width = image.shape[:2]
    center_x = min(max(offset_x, zoomed_size // 2), width - zoomed_size // 2)
    center_y = min(max(offset_y, zoomed_size // 2), height - zoomed_size // 2)

    # cropping
    start_x = center_x - zoomed_size // 2
    end_x = center_x + zoomed_size // 2
    start_y = center_y - zoomed_size // 2
    end_y = center_y + zoomed_size // 2
    cropped_image = image[start_y:end_y, start_x:end_x]

    if zoom != 1:
        cropped_image = cv2.resize(cropped_image, (size, size))

    plt.figure(figsize=(12, 6))

    plt.subplot(1, 2, 1)
    plt.title('Original Image with Crop Area')
    plt.imshow(image, cmap='gray')
    plt.gca().add_patch(plt.Rectangle((start_x, start_y), zoomed_size, zoomed_size,
                                      edgecolor='red', facecolor='none', lw=2))
    plt.axis('off')
    plt.subplot(1, 2, 2)
    plt.title('Cropped & Zoomed Image')
    plt.imshow(cropped_image, cmap='gray')
    plt.axis('off')

    plt.show()

    return cropped_image
  • What: The images I chose depict the Space Needle, an iconic landmark located in Seattle, Washington. It's a tower that was built for the 1962 World's Fair and has since become a symbol of the city and a popular tourist attraction.
  • Why:
    • Variability in Viewpoints and Lighting: There are a lot of different images with variant angles, lighting conditions and obstructions available.

To calculate the HOG for every image in a fast way, I crop the images to 256 x 256 pixel images.

In [ ]:
cropped_images = []

image1 = cv2.imread(IMAGE_PATH + 'keypoint_matching/1.jpeg', cv2.IMREAD_GRAYSCALE)
cropped_images.append(resize_and_crop(image1, 256, offset_x=image1.shape[1]//2, offset_y=0, zoom=0.4))

image2 = cv2.imread(IMAGE_PATH + 'keypoint_matching/2.jpeg', cv2.IMREAD_GRAYSCALE)
cropped_images.append(resize_and_crop(image2, 256, offset_x=image2.shape[1]//2-100, offset_y=1200, zoom=0.2))

image3 = cv2.imread(IMAGE_PATH + 'keypoint_matching/3.jpeg', cv2.IMREAD_GRAYSCALE)
cropped_images.append(resize_and_crop(image3, 256, offset_x=image3.shape[1]//2, offset_y=0, zoom=0.4))
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image

What's contained in Histogram of Oriented Gradients?¶

The Histogram of Oriented Gradients (HOG) is a feature descriptor usually used in image processing and computer vision for the purpose of object detection. The HOG descriptor focuses on the structure or the shape of an object. It works by first calculating the gradients of the image, which highlight edges and contours. The gradient of an image is a directional change in the intensity or color in the image. Let's break down the HOGs main components:

  • calculate_gradient_of_image Function:
    • Process:
      1. This function uses the Sobel operator to calculate the gradient of the image in both the x (horizontal) and y (vertical) directions (dx and dy respectively).
      2. cv2.Sobel computes the derivative of the image. When the function is applied with the parameters (1, 0), it computes the derivative along the x-axis (horizontal changes), and with (0, 1), it computes the derivative along the y-axis (vertical changes).
  • calculate_magnitude Function:
    • Process:
      1. This function calculates the magnitude of the gradient at each point in the image.
      2. The magnitude is calculated using the Euclidean distance: $\sqrt{dx^2 + dy^2}$. This yields the strength of the gradient at each point.
  • calculate_orientation Function:
    • Process:
      1. This function calculates the orientation of the gradient at each point in the image.
      2. The orientation is calculated using the arctangent of the ratio of the gradients ($\arctan2(dy, dx)$), which yields the angle of the gradient at each point.
In [ ]:
def calculate_gradient_of_image(image, kernel_size=3):
    dx = cv2.Sobel(image, cv2.CV_32F, 1, 0, ksize=kernel_size)
    dy = cv2.Sobel(image, cv2.CV_32F, 0, 1, ksize=kernel_size)
    
    return dx, dy

def calculate_magnitude(dx, dy):
    magnitude = np.sqrt(dx**2 + dy**2)
    
    return magnitude

def calculate_orientation(dx, dy):
    angle = np.arctan2(dy, dx)
    
    return angle

def show_hog_steps(image):
    dx, dy = calculate_gradient_of_image(image)
    magnitude = calculate_magnitude(dx, dy)
    orientation = calculate_orientation(dx, dy)
    
    fig, ax = plt.subplots(1, 3, figsize=(12, 6))
    ax[0].imshow(image, cmap='gray')
    ax[0].set_title('Original Image')
    ax[0].axis('off')
    ax[1].imshow(magnitude, cmap='gray')
    ax[1].set_title('Gradient Magnitude')
    ax[1].axis('off')
    ax[2].imshow(orientation, cmap='hsv')
    ax[2].set_title('Gradient Orientation')
    ax[2].axis('off')
    plt.show()
    
for image in cropped_images:
    show_hog_steps(image)
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image

Calculating HOGs¶

After these steps, the HOG descriptor then creates histograms of the orientations of these gradients (hence the name Histogram of Oriented Gradients). These histograms are created over various blocks of the image. The combination of these histograms then forms the feature vector used for keypoint matching.

The implemented calculate_histograms()-function builds the Histogram of Oriented Gradients (HOG) descriptor for a given image. This function takes the magnitude and angle of gradients (computed from the image) and compiles these into histograms within specified cells of the image. Each bin in a histogram (cell) represents the frequency of gradient orientations within a specific range of angles.

In [ ]:
def calculate_histograms(magnitude, angle, cell_size=(4, 4), num_bins=9):
    start = time.time()
    
    bins_range = np.linspace(-np.pi, np.pi, num_bins+1)
    
    height, width = magnitude.shape
    
    cells_x = width // cell_size[0]
    cells_y = height // cell_size[1]
    
    hog = np.zeros((cells_y, cells_x, num_bins))
    
    for cx in range(cells_x):
        for cy in range(cells_y):
            x_start = cx * cell_size[0]
            x_end = x_start + cell_size[0]
            y_start = cy * cell_size[1]
            y_end = y_start + cell_size[1]
            
            cell_magnitude = magnitude[y_start:y_end, x_start:x_end]
            cell_angle = angle[y_start:y_end, x_start:x_end]
            
            # Calculate the histogram for the cell
            cell_hist, _ = np.histogram(cell_angle, bins=bins_range, weights=cell_magnitude)
            
            hog[cy, cx, :] = cell_hist
    
    hog /= hog.max()
    
    end = time.time()
    return hog, end - start

def plot_hog(hog, cell_size=(4, 4), num_bins=9):
    img_height, img_width = hog.shape[0] * cell_size[0], hog.shape[1] * cell_size[1]
    plt.figure(figsize=(8,8))
    plt.xlim([0, img_width])
    plt.ylim([0, img_height])
    plt.gca().invert_yaxis()
    
    bin_centers = np.linspace(0, 2 * np.pi, num_bins, endpoint=False)

    for y in range(hog.shape[0]):
        for x in range(hog.shape[1]):
            cell_magnitude = hog[y, x, :]
            cell_max_mag = np.amax(cell_magnitude)
            if cell_max_mag > 0:
                cell_magnitude = cell_magnitude / cell_max_mag
            for o in range(num_bins):
                angle = bin_centers[o]
                magnitude = cell_magnitude[o]
                dx = magnitude * cell_size[0] * np.cos(angle) / 2.0
                dy = magnitude * cell_size[1] * np.sin(angle) / 2.0
                x1 = (x + 0.5) * cell_size[0] - dx
                y1 = (y + 0.5) * cell_size[1] - dy
                x2 = (x + 0.5) * cell_size[0] + dx
                y2 = (y + 0.5) * cell_size[1] + dy
                plt.plot([x1, x2], [y1, y2], 'r', linewidth=1, alpha=0.6)

    plt.title("HOG Feature Descriptors")
    plt.axis('off')
    plt.show()

Matching the Preprocessed Space Needles¶

The following function match_and_plot_hogs() matches the keypoints only utilizing the HOGs of two images and the euclidean distance.

The function finds keypoints as follows:

  1. Only keep keypoints (histograms) with distinct meaningful orientations, calculated with threshold
  2. Match the keypoints between both images according to cdist (Euclidean Distance)
  3. Put together a combined_image and connect the matching pairs visually
In [ ]:
def match_and_plot_hogs(hog1, image1, hog2, image2, cell_size=(4, 4), threshold=0.7):
    # Identify keypoints in the HOG descriptors
    keypoints1 = np.argwhere(hog1 > hog1.max() * threshold) * cell_size[0]
    keypoints2 = np.argwhere(hog2 > hog2.max() * threshold) * cell_size[1]

    # Match keypoints
    distances = cdist(keypoints1, keypoints2)
    matches = np.argmin(distances, axis=1)

    # Create a new plot
    fig, ax = plt.subplots(figsize=(15, 10))

    # Combine the two images into one for plotting
    combined_image = np.concatenate((image1, image2), axis=1)

    # Plot the combined image
    ax.imshow(combined_image, cmap='gray')

    # Adjust the keypoints of the second image to align with its position in the combined image
    keypoints2[:, 1] += image1.shape[1]

    # Plot keypoints on the combined image
    ax.scatter(keypoints1[:, 1], keypoints1[:, 0], facecolors='none', edgecolors='r')
    ax.scatter(keypoints2[:, 1], keypoints2[:, 0], facecolors='none', edgecolors='r')

    # Connect matching keypoints with lines
    for i, match in enumerate(matches):
        ax.plot([keypoints1[i][1], keypoints2[match][1]], [keypoints1[i][0], keypoints2[match][0]], 'y-')

    plt.title('Matched Keypoints Between Two Images')
    plt.axis('off')
    plt.show()

This next block will calculate and visualize the HOGs for all space needle images in this experiment:

In [ ]:
CELL_SIZE = (4, 4)
NUM_BINS = 9

image_hogs = []

for image in cropped_images:
    dx, dy = calculate_gradient_of_image(image)
    magnitude = calculate_magnitude(dx, dy)
    orientation = calculate_orientation(dx, dy)
    hog, recorded_time = calculate_histograms(magnitude, orientation, cell_size=CELL_SIZE, num_bins=NUM_BINS)
    image_hogs.append(hog)
    plot_hog(hog, cell_size=CELL_SIZE, num_bins=NUM_BINS)
    print('Time to calculate HOG for image: {:.5f} seconds'.format(recorded_time))
No description has been provided for this image
Time to calculate HOG for image: 0.16745 seconds
No description has been provided for this image
Time to calculate HOG for image: 0.13381 seconds
No description has been provided for this image
Time to calculate HOG for image: 0.13395 seconds

What do the "star shapes" on plots mean?

  • Orientation: Each line's orientation corresponds to the direction of the gradient (change in intensity) for a particular patch of the image. It's like an arrow showing which way the gradient is strongest.
  • Length: The length of each line indicates the magnitude of the gradient. A longer line means a stronger gradient, while a shorter line indicates a weaker gradient.
  • Position: The lines are placed in a grid-like pattern over the image, with each cell in the grid corresponding to a local region of the image. The lines within each cell summarize the gradient information for that region.
In [ ]:
reference_image, reference_hog = cropped_images[0], image_hogs[0]

for i in range(1, len(cropped_images)):
    match_and_plot_hogs(reference_hog, reference_image, image_hogs[i], cropped_images[i], cell_size=CELL_SIZE, threshold=0.7)
No description has been provided for this image
No description has been provided for this image

The Keypoint Matching with HOG Features show difficulties in the first matching as the second image's angle differs too much from the first one. When thinking about what the HOG describes this observation makes sense: The histograms in the second image have a completely different direction and magnitude than the first image's histograms.

In the second matching most of the keypoints get matched quite well. Both image's angles and therefore the object itself looks quite the same.

Choosen Parameters:

  • cell_size = (4, 4): I chose a relatively big cell size for performance reasons. This cell size still allows for enough histograms to describe the towers features while not taking too long to calculate. In a later section I will also run performance tests regarding this.
  • n_bins = 9: I chose the 9 bins as a good middle ground to describe enough directions and to get a meaningful number of frequencies for high magnitudes.
  • threshold = 0.7: I played around with the threshold and especially in the second matching the higher threshold of 0.7 was needed as the matching function started to match on the leaves which is not the desired outcome.

Looking at special cases¶

In order to furthermore explore the abilities, advantages and disadvantages of the HOG Feature Descriptor this section will dive into these subjects.

Matching HOG in Rotation¶

As already noted in the section above, whenever we change the angle of the view we have on the object we also change the directions of the histograms in the HOG. The expected output therefore shouldn't be a surprise: We probably wont be able to match any keypoints in a robust way.

In [ ]:
image_rotated = cv2.warpAffine(reference_image, cv2.getRotationMatrix2D((reference_image.shape[1]//2, reference_image.shape[0]//2), 40, 1), (reference_image.shape[1], reference_image.shape[0]))
zoomed_image = resize_and_crop(image_rotated, 256, offset_x=image_rotated.shape[1]//2, offset_y=130, zoom=1.5)
No description has been provided for this image
In [ ]:
rotated_hog, _ = calculate_histograms(*calculate_gradient_of_image(zoomed_image))

match_and_plot_hogs(reference_hog, reference_image, rotated_hog, zoomed_image)
No description has been provided for this image

As expected, the keypoints could not be matched with the rotated image as the histograms don't have much similarity at the right spots.

Lighting Conditions¶

Contrary to the issues with rotation, lighting conditions shouldn't have a negative impact on the ability to match keypoints as the gradients and magnitudes still have the same orientation.

I would expect there to be more matches, though the issue with this picture of the Space Needle at night also is that it has a slightly different angle than the image at daytime.

In [ ]:
img_night = cv2.imread(IMAGE_PATH + 'keypoint_matching/space_needle_night.jpeg', cv2.IMREAD_GRAYSCALE)
img_night_cropped = resize_and_crop(img_night, 256, offset_x=img_night.shape[1]//2, offset_y=1800, zoom=.1)

dx, dy = calculate_gradient_of_image(img_night_cropped)
magnitude = calculate_magnitude(dx, dy)
orientation = calculate_orientation(dx, dy)
hog_night, _ = calculate_histograms(magnitude, orientation, cell_size=CELL_SIZE, num_bins=NUM_BINS)
plot_hog(hog_night, cell_size=CELL_SIZE, num_bins=NUM_BINS)
No description has been provided for this image
No description has been provided for this image
In [ ]:
match_and_plot_hogs(hog_night, img_night_cropped, image_hogs[2], cropped_images[2], cell_size=CELL_SIZE, threshold=0.7)
No description has been provided for this image

There are some keypoints that get matched (at the top of the tower) but other keypoints further down again fail to get matched. I would attribute this observation to the different angle of the image again.

Obstructing Objects¶

This subsection looks the effect that other objects can have on the matching process.

I expect if objects in the background have similar gradients/magnitudes they will also get falsely matched.

In [ ]:
img_city = cv2.imread(IMAGE_PATH + 'keypoint_matching/space_needle_city.jpg', cv2.IMREAD_GRAYSCALE)
img_city_cropped = resize_and_crop(img_city, 256, offset_x=170, offset_y=0, zoom=1)

dx, dy = calculate_gradient_of_image(img_city_cropped)
magnitude = calculate_magnitude(dx, dy)
orientation = calculate_orientation(dx, dy)
hog_city, _ = calculate_histograms(magnitude, orientation, cell_size=CELL_SIZE, num_bins=NUM_BINS)
plot_hog(hog_city, cell_size=CELL_SIZE, num_bins=NUM_BINS)
No description has been provided for this image
No description has been provided for this image
In [ ]:
match_and_plot_hogs(image_hogs[0], cropped_images[0], hog_city, img_city_cropped, cell_size=CELL_SIZE, threshold=.3)
No description has been provided for this image

The background also seems to have some similar histograms with the reference image. Depending on the background these matches could be even more faulty but in this case the skyscrapers of seattle are shaped more vertically and horizontally straight while the Space Needle has less edges along these vertical/horizontal angles.

How Image Size influences Performance¶

In this experiment I defined 5 different HOG parameters (test_params). These should give us a grasp on what parameters have a high impact on calculation efficiency.

In [ ]:
def test_performance(image, cell_size, num_bins):
    if len(image.shape) > 2:
        image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    
    plt.imshow(image, cmap='gray')
    plt.title('Image to Test')
    plt.show()
    
    start_grad = time.time()
    dx, dy = calculate_gradient_of_image(image)
    end_grad = time.time()
    start_mag = time.time()
    magnitude = calculate_magnitude(dx, dy)
    end_mag = time.time()
    start_orient = time.time()
    orientation = calculate_orientation(dx, dy)
    end_orient = time.time()
    _, recorded_time = calculate_histograms(magnitude, orientation, cell_size=cell_size, num_bins=num_bins)
    end = time.time()
    
    print('--- HoG Calculation Stats ---')
    print(f'Cell Size: {cell_size}')
    print(f'Number of Bins: {num_bins}')
    print(f'Size of Image: {image.shape[0]}x{image.shape[1]} Pixels')
    print()
    print('Results:')
    print(f'Time to calculate gradient: {end_grad - start_grad:.2f} seconds')
    print(f'Time to calculate magnitude: {end_mag - start_mag:.2f} seconds')
    print(f'Time to calculate orientation: {end_orient - start_orient:.2f} seconds')
    print(f'Time to calculate HOG: {recorded_time:.2f} seconds')
    print(f'Total time: {end - start_grad:.2f} seconds')
In [ ]:
test_params = [
    [img_night, (8, 8), 9],
    [img_night, (4, 4), 9],
    [img_night, (2, 2), 9],
    [img_night, (8, 8), 12],
    [img_night, (8, 8), 18],
]

for params in test_params:
    test_performance(*params)
No description has been provided for this image
--- HoG Calculation Stats ---
Cell Size: (8, 8)
Number of Bins: 9
Size of Image: 5557x3705 Pixels

Results:
Time to calculate gradient: 0.04 seconds
Time to calculate magnitude: 0.12 seconds
Time to calculate orientation: 0.29 seconds
Time to calculate HOG: 10.98 seconds
Total time: 11.43 seconds
No description has been provided for this image
--- HoG Calculation Stats ---
Cell Size: (4, 4)
Number of Bins: 9
Size of Image: 5557x3705 Pixels

Results:
Time to calculate gradient: 0.05 seconds
Time to calculate magnitude: 0.11 seconds
Time to calculate orientation: 0.25 seconds
Time to calculate HOG: 41.48 seconds
Total time: 41.89 seconds
No description has been provided for this image
--- HoG Calculation Stats ---
Cell Size: (2, 2)
Number of Bins: 9
Size of Image: 5557x3705 Pixels

Results:
Time to calculate gradient: 0.05 seconds
Time to calculate magnitude: 0.11 seconds
Time to calculate orientation: 0.25 seconds
Time to calculate HOG: 176.19 seconds
Total time: 176.60 seconds
No description has been provided for this image
--- HoG Calculation Stats ---
Cell Size: (8, 8)
Number of Bins: 12
Size of Image: 5557x3705 Pixels

Results:
Time to calculate gradient: 0.04 seconds
Time to calculate magnitude: 0.10 seconds
Time to calculate orientation: 0.25 seconds
Time to calculate HOG: 11.28 seconds
Total time: 11.68 seconds
No description has been provided for this image
--- HoG Calculation Stats ---
Cell Size: (8, 8)
Number of Bins: 18
Size of Image: 5557x3705 Pixels

Results:
Time to calculate gradient: 0.05 seconds
Time to calculate magnitude: 0.08 seconds
Time to calculate orientation: 0.23 seconds
Time to calculate HOG: 11.25 seconds
Total time: 11.61 seconds

Some observations based on the experiment:

  • Cell Size is the biggest denominator when it comes to performance: The smaller the window, the more histograms have to be calculated
  • Number of bins only slightly impacts performance: More bins take longer but the difference is only minimal - This might be due to numpy's internal vectorization
  • Obviously the size of the image has the biggest impact as it creates the boundaries for the other parameters

Conclusion on HOG¶

The Histogram of Oriented Gradients (HOG) is effective for object detection in images due to its ability to capture edge and gradient structures indicative of shape. However, for keypoint matching—where precise identification and tracking of specific features are required - HOG falls short. It struggles with variations in viewpoint and rotation, and can confuse keypoints among similarly shaped objects. More sophisticated descriptors like SIFT or SURF are often preferred for their robustness to such variations and their ability to distinguish between features under a wide range of conditions.

GBSV Bear

Thanks for joining me on this little journey through the Pacific Northwest, I hope you were able to learn as much on the way as I did!

-- All storytelling images were generated with the help of DALL-E 3 by OpenAI